Since Jan. 1, 2015, The Washington Post has been compiling a database of every fatal shooting in the US by a police officer in the line of duty.

While there are many challenges around data collection and reporting, The Washington Post has been tracking more than a dozen details about each killing. These include the race, age and gender of the deceased, whether the person was armed, and whether the victim was experiencing a mental-health crisis. The Post has gathered this supplemental information from law enforcement websites, local news reports, social media, and by monitoring independent databases such as "Killed by Police" and "Fatal Encounters", and has conducted additional reporting in many cases.
There are four additional datasets drawn from US census data: poverty rate, high school graduation rate, median household income, and racial demographics by city.
Run the cell below if you are working with Google Colab
%pip install --upgrade plotly
import numpy as np
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt
import seaborn as sns
# This might be helpful:
from collections import Counter
pd.options.display.float_format = '{:,.2f}'.format
df_hh_income = pd.read_csv('Median_Household_Income_2015.csv', encoding="windows-1252")
df_pct_poverty = pd.read_csv('Pct_People_Below_Poverty_Level.csv', encoding="windows-1252")
df_pct_completed_hs = pd.read_csv('Pct_Over_25_Completed_High_School.csv', encoding="windows-1252")
df_share_race_city = pd.read_csv('Share_of_Race_By_City.csv', encoding="windows-1252")
df_fatalities = pd.read_csv('Deaths_by_Police_US.csv', encoding="windows-1252")
#income
df_hh_income.shape
(29322, 3)
df_hh_income.columns
Index(['Geographic Area', 'City', 'Median Income'], dtype='object')
df_hh_income.isnull().sum()
Geographic Area 0 City 0 Median Income 51 dtype: int64
df_hh_income.duplicated().sum()
0
df_pct_poverty.shape
(29329, 3)
df_pct_poverty.columns
Index(['Geographic Area', 'City', 'poverty_rate'], dtype='object')
df_pct_poverty.isnull().sum()
Geographic Area 0 City 0 poverty_rate 0 dtype: int64
df_pct_poverty.duplicated().sum()
0
df_pct_completed_hs.shape
(29329, 3)
df_pct_completed_hs.columns
Index(['Geographic Area', 'City', 'percent_completed_hs'], dtype='object')
df_pct_completed_hs.isnull().sum()
Geographic Area 0 City 0 percent_completed_hs 0 dtype: int64
df_pct_completed_hs.duplicated().sum()
0
df_share_race_city.shape
(29268, 7)
df_share_race_city.columns
Index(['Geographic area', 'City', 'share_white', 'share_black',
'share_native_american', 'share_asian', 'share_hispanic'],
dtype='object')
df_share_race_city.isnull().sum()
Geographic area 0 City 0 share_white 0 share_black 0 share_native_american 0 share_asian 0 share_hispanic 0 dtype: int64
df_share_race_city.duplicated().sum()
0
df_fatalities.shape
(2535, 14)
df_fatalities.columns
Index(['id', 'name', 'date', 'manner_of_death', 'armed', 'age', 'gender',
'race', 'city', 'state', 'signs_of_mental_illness', 'threat_level',
'flee', 'body_camera'],
dtype='object')
df_fatalities.isnull().sum()
id 0 name 0 date 0 manner_of_death 0 armed 9 age 77 gender 0 race 195 city 0 state 0 signs_of_mental_illness 0 threat_level 0 flee 65 body_camera 0 dtype: int64
df_fatalities.duplicated().sum()
0
Consider how to deal with the NaN values. Perhaps substituting 0 is appropriate.
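Whether 0 is the right fill value depends on the column: for a rate column, filling with 0 pulls the averages down, while dropping the row leaves them unchanged. A minimal sketch of the two options on a toy column (illustrative values only):

```python
import numpy as np
import pandas as pd

# Toy column with one missing value (illustrative only)
s = pd.Series([10.0, np.nan, 20.0])

filled = s.fillna(0)    # the NaN becomes a real 0 and pulls the mean down
dropped = s.dropna()    # the NaN is simply excluded from the mean

print(filled.mean(), dropped.mean())  # 10.0 15.0
```

For columns like poverty_rate, filling with 0 therefore slightly biases state averages downward; dropping the affected rows is the alternative.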
df_hh_income = pd.read_csv('Median_Household_Income_2015.csv', encoding="windows-1252")
df_pct_poverty = pd.read_csv('Pct_People_Below_Poverty_Level.csv', encoding="windows-1252")
df_pct_completed_hs = pd.read_csv('Pct_Over_25_Completed_High_School.csv', encoding="windows-1252")
df_share_race_city = pd.read_csv('Share_of_Race_By_City.csv', encoding="windows-1252")
df_fatalities = pd.read_csv('Deaths_by_Police_US.csv', encoding="windows-1252")
df_hh_income.fillna(value=0, inplace=True)
df_pct_poverty.fillna(value=0, inplace=True)
df_pct_completed_hs.fillna(value=0, inplace=True)
df_share_race_city.fillna(value=0, inplace=True)
df_fatalities.fillna(value=0, inplace=True)
Create a bar chart that ranks the poverty rate from highest to lowest by US state. Which state has the highest poverty rate? Which state has the lowest poverty rate?
df_pct_poverty.drop(df_pct_poverty[df_pct_poverty["poverty_rate"]=="-"].index, inplace = True)
df_pct_poverty["poverty_rate"]=pd.to_numeric(df_pct_poverty["poverty_rate"])
df_pct_poverty_fg = df_pct_poverty.groupby('Geographic Area')["poverty_rate"].mean().reset_index()
df_pct_poverty_fg
| Geographic Area | poverty_rate | |
|---|---|---|
| 0 | AK | 19.85 |
| 1 | AL | 20.65 |
| 2 | AR | 22.96 |
| 3 | AZ | 25.67 |
| 4 | CA | 17.12 |
| 5 | CO | 13.36 |
| 6 | CT | 9.14 |
| 7 | DC | 18.00 |
| 8 | DE | 12.56 |
| 9 | FL | 17.57 |
| 10 | GA | 23.78 |
| 11 | HI | 13.40 |
| 12 | IA | 12.29 |
| 13 | ID | 18.24 |
| 14 | IL | 13.88 |
| 15 | IN | 15.50 |
| 16 | KS | 14.76 |
| 17 | KY | 20.08 |
| 18 | LA | 22.34 |
| 19 | MA | 9.59 |
| 20 | MD | 10.31 |
| 21 | ME | 16.89 |
| 22 | MI | 17.90 |
| 23 | MN | 13.75 |
| 24 | MO | 20.11 |
| 25 | MS | 26.88 |
| 26 | MT | 16.51 |
| 27 | NC | 19.75 |
| 28 | ND | 12.16 |
| 29 | NE | 12.98 |
| 30 | NH | 12.66 |
| 31 | NJ | 8.19 |
| 32 | NM | 23.08 |
| 33 | NV | 12.47 |
| 34 | NY | 11.67 |
| 35 | OH | 14.85 |
| 36 | OK | 20.66 |
| 37 | OR | 16.52 |
| 38 | PA | 12.52 |
| 39 | RI | 10.37 |
| 40 | SC | 22.16 |
| 41 | SD | 16.03 |
| 42 | TN | 19.89 |
| 43 | TX | 19.92 |
| 44 | UT | 11.98 |
| 45 | VA | 14.59 |
| 46 | VT | 13.79 |
| 47 | WA | 15.02 |
| 48 | WI | 12.86 |
| 49 | WV | 21.13 |
| 50 | WY | 9.89 |
# Rank states from highest to lowest average poverty rate
df_pct_poverty_fg = df_pct_poverty_fg.sort_values('poverty_rate', ascending=False)
plt.figure(figsize=(12, 6))
sns.barplot(x='Geographic Area', y='poverty_rate', data=df_pct_poverty_fg)
plt.xticks(rotation=90)
plt.show()
# Mississippi (MS) has the highest average poverty rate; New Jersey (NJ) the lowest
Show the High School Graduation Rate in ascending order of US States. Which state has the lowest high school graduation rate? Which state has the highest?
df_pct_completed_hs
| Geographic Area | City | percent_completed_hs | |
|---|---|---|---|
| 0 | AL | Abanda CDP | 21.2 |
| 1 | AL | Abbeville city | 69.1 |
| 2 | AL | Adamsville city | 78.9 |
| 3 | AL | Addison town | 81.4 |
| 4 | AL | Akron town | 68.6 |
| ... | ... | ... | ... |
| 29324 | WY | Woods Landing-Jelm CDP | 100 |
| 29325 | WY | Worland city | 85.6 |
| 29326 | WY | Wright town | 89.2 |
| 29327 | WY | Yoder town | 79.4 |
| 29328 | WY | Y-O Ranch CDP | 100 |
29329 rows × 3 columns
df_pct_completed_hs.drop(df_pct_completed_hs[df_pct_completed_hs["percent_completed_hs"]=="-"].index, inplace = True)
df_pct_completed_hs["percent_completed_hs"]=pd.to_numeric(df_pct_completed_hs["percent_completed_hs"])
df_pct_completed_hs_fg= df_pct_completed_hs.groupby('Geographic Area')["percent_completed_hs"].mean().reset_index()
# Show states in ascending order of graduation rate
df_pct_completed_hs_fg = df_pct_completed_hs_fg.sort_values('percent_completed_hs')
plt.figure(figsize=(12, 6))
sns.barplot(x='Geographic Area', y='percent_completed_hs', data=df_pct_completed_hs_fg)
plt.xticks(rotation=90)
plt.show()
# Texas (TX) has the lowest average graduation rate; Massachusetts (MA) the highest
# create figure and axis objects with subplots()
fig, ax = plt.subplots(figsize=(10, 6))
# make a plot
ax.plot(df_pct_completed_hs_fg['Geographic Area'],
df_pct_completed_hs_fg['percent_completed_hs'],color="red", marker="o")
# set x-axis label
ax.set_xlabel('States',fontsize=14)
# set y-axis label
ax.set_ylabel("Percent Completion High School",color="red",fontsize=14)
# twin object for two different y-axis on the sample plot
ax2=ax.twinx()
# make a plot with different y-axis using second axis object
ax2.plot(df_pct_poverty_fg['Geographic Area'],df_pct_poverty_fg['poverty_rate'],color="blue",marker="o")
ax2.set_ylabel("Poverty",color="blue",fontsize=14)
ax.tick_params(axis='x', which='major', labelsize=10,rotation=90)
plt.show()
sns.jointplot(x=df_pct_poverty_fg["poverty_rate"],y=df_pct_completed_hs_fg['percent_completed_hs'], color = 'blue')
plt.show()
data=pd.DataFrame([df_pct_poverty_fg["poverty_rate"],df_pct_completed_hs_fg['percent_completed_hs']])
data=data.T
data
| poverty_rate | percent_completed_hs | |
|---|---|---|
| 0 | 19.85 | 84.63 |
| 1 | 20.65 | 80.30 |
| 2 | 22.96 | 79.95 |
| 3 | 25.67 | 80.47 |
| 4 | 17.12 | 81.96 |
| 5 | 13.36 | 90.11 |
| 6 | 9.14 | 91.59 |
| 7 | 18.00 | 89.30 |
| 8 | 12.56 | 88.52 |
| 9 | 17.57 | 85.74 |
| 10 | 23.78 | 79.01 |
| 11 | 13.40 | 91.67 |
| 12 | 12.29 | 90.11 |
| 13 | 18.24 | 85.17 |
| 14 | 13.88 | 88.48 |
| 15 | 15.50 | 86.32 |
| 16 | 14.76 | 88.23 |
| 17 | 20.08 | 82.37 |
| 18 | 22.34 | 79.29 |
| 19 | 9.59 | 92.40 |
| 20 | 10.31 | 88.42 |
| 21 | 16.89 | 91.43 |
| 22 | 17.90 | 89.21 |
| 23 | 13.75 | 89.47 |
| 24 | 20.11 | 83.52 |
| 25 | 26.88 | 78.47 |
| 26 | 16.51 | 90.49 |
| 27 | 19.75 | 83.25 |
| 28 | 12.16 | 87.82 |
| 29 | 12.98 | 89.99 |
| 30 | 12.66 | 90.71 |
| 31 | 8.19 | 90.85 |
| 32 | 23.08 | 80.98 |
| 33 | 12.47 | 87.72 |
| 34 | 11.67 | 90.61 |
| 35 | 14.85 | 88.34 |
| 36 | 20.66 | 82.91 |
| 37 | 16.52 | 88.30 |
| 38 | 12.52 | 89.02 |
| 39 | 10.37 | 88.82 |
| 40 | 22.16 | 80.85 |
| 41 | 16.03 | 87.75 |
| 42 | 19.89 | 81.63 |
| 43 | 19.92 | 75.69 |
| 44 | 11.98 | 91.62 |
| 45 | 14.59 | 84.88 |
| 46 | 13.79 | 89.98 |
| 47 | 15.02 | 88.20 |
| 48 | 12.86 | 90.26 |
| 49 | 21.13 | 82.35 |
| 50 | 9.89 | 92.10 |
Use .lmplot() or .regplot() to show a linear regression between the poverty rate and the high school graduation rate.
sns.lmplot(x="poverty_rate", y='percent_completed_hs', data=data)
plt.show()
sns.regplot(x="poverty_rate",y='percent_completed_hs',data=data)
plt.show()
Visualise the share of the white, black, Hispanic, Asian and Native American population in each US state using a bar chart with sub-sections (a stacked bar chart).
df_share_race_city = pd.read_csv('Share_of_Race_By_City.csv', encoding="windows-1252")
df_share_race_city
| Geographic area | City | share_white | share_black | share_native_american | share_asian | share_hispanic | |
|---|---|---|---|---|---|---|---|
| 0 | AL | Abanda CDP | 67.2 | 30.2 | 0 | 0 | 1.6 |
| 1 | AL | Abbeville city | 54.4 | 41.4 | 0.1 | 1 | 3.1 |
| 2 | AL | Adamsville city | 52.3 | 44.9 | 0.5 | 0.3 | 2.3 |
| 3 | AL | Addison town | 99.1 | 0.1 | 0 | 0.1 | 0.4 |
| 4 | AL | Akron town | 13.2 | 86.5 | 0 | 0 | 0.3 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 29263 | WY | Woods Landing-Jelm CDP | 95.9 | 0 | 0 | 2.1 | 0 |
| 29264 | WY | Worland city | 89.9 | 0.3 | 1.3 | 0.6 | 16.6 |
| 29265 | WY | Wright town | 94.5 | 0.1 | 1.4 | 0.2 | 6.2 |
| 29266 | WY | Yoder town | 97.4 | 0 | 0 | 0 | 4 |
| 29267 | WY | Y-O Ranch CDP | 92.8 | 1.5 | 2.6 | 0 | 11.8 |
29268 rows × 7 columns
# Drop census-suppressed "(X)" rows, then convert the share columns to numeric
df_share_race_city.drop(df_share_race_city[df_share_race_city["share_white"] == "(X)"].index, inplace=True)
share_cols = ["share_white", "share_black", "share_native_american", "share_asian", "share_hispanic"]
for col in share_cols:
    df_share_race_city[col] = pd.to_numeric(df_share_race_city[col])
df_share_race_city_fg = df_share_race_city.groupby('Geographic area')[share_cols].mean().reset_index()
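The per-state shares can then be drawn as a bar chart with sub-sections via `DataFrame.plot(stacked=True)`. A minimal sketch with a two-state toy frame standing in for the real grouped state means (the numbers are illustrative, not the actual averages):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy stand-in for the per-state mean shares (illustrative values only)
shares = pd.DataFrame({
    'Geographic area': ['AL', 'WY'],
    'share_white': [67.2, 92.8],
    'share_black': [30.2, 1.5],
    'share_native_american': [0.5, 2.6],
    'share_asian': [0.3, 0.0],
    'share_hispanic': [1.6, 11.8],
}).set_index('Geographic area')

# Each race column becomes one sub-section of the state's bar
ax = shares.plot(kind='bar', stacked=True, figsize=(12, 6))
ax.set_ylabel('Share of Population (%)')
ax.set_title('Racial Makeup by State')
plt.show()
```

On the real data the same call would be made on the grouped state means instead of this toy frame.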
Hint: Use .value_counts()
| id | name | date | manner_of_death | armed | age | gender | race | city | state | signs_of_mental_illness | threat_level | flee | body_camera | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | Tim Elliot | 02/01/15 | shot | gun | 53.00 | M | A | Shelton | WA | True | attack | Not fleeing | False |
| 1 | 4 | Lewis Lee Lembke | 02/01/15 | shot | gun | 47.00 | M | W | Aloha | OR | False | attack | Not fleeing | False |
| 2 | 5 | John Paul Quintero | 03/01/15 | shot and Tasered | unarmed | 23.00 | M | H | Wichita | KS | False | other | Not fleeing | False |
| 3 | 8 | Matthew Hoffman | 04/01/15 | shot | toy weapon | 32.00 | M | W | San Francisco | CA | True | attack | Not fleeing | False |
| 4 | 9 | Michael Rodriguez | 04/01/15 | shot | nail gun | 39.00 | M | H | Evans | CO | False | attack | Not fleeing | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2530 | 2822 | Rodney E. Jacobs | 28/07/17 | shot | gun | 31.00 | M | 0 | Kansas City | MO | False | attack | Not fleeing | False |
| 2531 | 2813 | TK TK | 28/07/17 | shot | vehicle | 0.00 | M | 0 | Albuquerque | NM | False | attack | Car | False |
| 2532 | 2818 | Dennis W. Robinson | 29/07/17 | shot | gun | 48.00 | M | 0 | Melba | ID | False | attack | Car | False |
| 2533 | 2817 | Isaiah Tucker | 31/07/17 | shot | vehicle | 28.00 | M | B | Oshkosh | WI | False | attack | Car | True |
| 2534 | 2815 | Dwayne Jeune | 31/07/17 | shot | knife | 32.00 | M | B | Brooklyn | NY | True | attack | Not fleeing | False |
2535 rows × 14 columns
values=df_fatalities["race"].value_counts()
values=dict(values)
values.values()
dict_values([1201, 618, 423, 195, 39, 31, 28])
# given values
sizes = values.values()
# labels for each chart segment
labels = values.keys()
# Pie chart
plt.pie(sizes, labels=labels, autopct='%1.1f%%', pctdistance=0.85)
# draw a white circle at the centre to turn the pie into a donut
centre_circle = plt.Circle((0, 0), 0.70, fc='white')
fig = plt.gcf()
fig.gca().add_artist(centre_circle)
plt.title('Deaths by Race')
plt.legend(labels, loc="upper right")
# Display the chart
plt.show()
Use df_fatalities to illustrate how many more men are killed compared to women.
df_fatalities["gender"].value_counts()
M 2428 F 107 Name: gender, dtype: int64
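The gap is easier to see as a bar chart; the counts below are taken from the value_counts() output above:

```python
import matplotlib.pyplot as plt

# Counts from df_fatalities["gender"].value_counts() above
gender_counts = {'M': 2428, 'F': 107}

plt.bar(gender_counts.keys(), gender_counts.values(), color=['steelblue', 'salmon'])
plt.ylabel('Number of People Killed')
plt.title('Police Killings by Gender')
plt.show()

# Roughly 23 men are killed for every woman
ratio = gender_counts['M'] / gender_counts['F']
```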
Break out the data by gender using df_fatalities. Is there a difference between men and women in the manner of death?
df_fatalities_female=df_fatalities[df_fatalities["gender"]=="F"]
df_fatalities_male=df_fatalities[df_fatalities["gender"]=="M"]
fig = plt.figure(figsize=(10, 7))
ax = fig.add_axes([0, 0, 1, 1])
# Boxplot of victim ages, split by gender
data = [df_fatalities_female["age"], df_fatalities_male["age"]]
bp = ax.boxplot(data, labels=["Female", "Male"])
ax.set_ylabel("Age")
plt.show()
df_fatalities_female_fg = df_fatalities_female.groupby("manner_of_death")
df_fatalities_female_fg = pd.DataFrame(df_fatalities_female_fg["armed"].count()).reset_index().rename(columns={'armed': 'Count'})
df_fatalities_female_fg
| manner_of_death | Count | |
|---|---|---|
| 0 | shot | 102 |
| 1 | shot and Tasered | 5 |
df_fatalities_male_fg = df_fatalities_male.groupby("manner_of_death")
df_fatalities_male_fg = pd.DataFrame(df_fatalities_male_fg["armed"].count()).reset_index().rename(columns={'armed': 'Count'})
df_fatalities_male_fg
| manner_of_death | Count | |
|---|---|---|
| 0 | shot | 2261 |
| 1 | shot and Tasered | 167 |
In what percentage of police killings were people armed? Create chart that show what kind of weapon (if any) the deceased was carrying. How many of the people killed by police were armed with guns versus unarmed?
(df_fatalities[df_fatalities["armed"]=="unarmed"].count()/df_fatalities.count())*100
id 6.75 name 6.75 date 6.75 manner_of_death 6.75 armed 6.75 age 6.75 gender 6.75 race 6.75 city 6.75 state 6.75 signs_of_mental_illness 6.75 threat_level 6.75 flee 6.75 body_camera 6.75 dtype: float64
About 6.75% of the people killed were unarmed.
df_fatalities_armed=df_fatalities.groupby("armed")
df_fatalities_armed=pd.DataFrame(df_fatalities_armed["city"].count()).reset_index()
df_fatalities_armed
| armed | city | |
|---|---|---|
| 0 | 0 | 9 |
| 1 | Taser | 9 |
| 2 | air conditioner | 1 |
| 3 | ax | 9 |
| 4 | baseball bat | 8 |
| ... | ... | ... |
| 64 | toy weapon | 104 |
| 65 | unarmed | 171 |
| 66 | undetermined | 117 |
| 67 | unknown weapon | 18 |
| 68 | vehicle | 177 |
69 rows × 2 columns
df_fatalities_armed=df_fatalities_armed[1:]
fig = plt.figure(figsize = (12, 6))
plt.bar(df_fatalities_armed["armed"], df_fatalities_armed["city"], color ='maroon',
width = 0.4)
plt.xlabel("Weapon")
plt.ylabel("Count")
plt.title("Weapon Carried by People")
plt.tick_params(axis='x', which='major', labelsize=9,rotation=90)
plt.show()
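The guns-versus-unarmed part of the question can be answered by reading two entries out of the same value_counts(); a sketch on a toy frame standing in for df_fatalities:

```python
import pandas as pd

# Toy stand-in for df_fatalities (the notebook would use the full dataset)
df = pd.DataFrame({'armed': ['gun', 'gun', 'unarmed', 'knife', 'gun', 'unarmed']})

counts = df['armed'].value_counts()
guns = counts.get('gun', 0)
unarmed = counts.get('unarmed', 0)
print(f"Armed with guns: {guns}, unarmed: {unarmed}")  # 3 vs 2 in this toy frame
```

On the real data the same two lookups give 1,398 killed while armed with guns versus 171 unarmed, per the grouped table above.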
Work out what percentage of people killed were under 25 years old.
(df_fatalities[df_fatalities["age"]<25].count()/df_fatalities.count())*100
id 20.79 name 20.79 date 20.79 manner_of_death 20.79 armed 20.79 age 20.79 gender 20.79 race 20.79 city 20.79 state 20.79 signs_of_mental_illness 20.79 threat_level 20.79 flee 20.79 body_camera 20.79 dtype: float64
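The repeated per-column percentages above come from dividing two count() frames; a boolean mean gives the same figure in one step. A sketch on a toy age column:

```python
import pandas as pd

# Toy stand-in for df_fatalities["age"]
ages = pd.Series([18, 22, 24, 30, 45, 60])

# A boolean Series averages to the fraction of True values
pct_under_25 = (ages < 25).mean() * 100
print(f"{pct_under_25:.1f}% under 25")  # 3 of 6 -> 50.0%
```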
Create a histogram and KDE plot that shows the distribution of ages of the people killed by police.
sns.displot(df_fatalities, x="age", kde=True)
<seaborn.axisgrid.FacetGrid at 0x16f2d6ed750>
df_fatalities_by_race=df_fatalities.groupby("race")
#df_fatalities_by_race=pd.DataFrame(df_fatalities_by_race)
df_fatalities_by_race.get_group("A")["age"]
0 53.00 69 35.00 155 35.00 160 28.00 265 28.00 303 39.00 337 32.00 468 60.00 685 59.00 686 31.00 766 44.00 899 25.00 945 39.00 987 0.00 1053 40.00 1203 38.00 1204 53.00 1293 41.00 1312 34.00 1322 28.00 1347 33.00 1480 27.00 1481 61.00 1560 18.00 1747 44.00 1790 45.00 1845 15.00 1883 26.00 1937 48.00 1991 37.00 1996 32.00 2052 41.00 2088 33.00 2198 45.00 2223 30.00 2235 33.00 2263 43.00 2367 18.00 2403 20.00 Name: age, dtype: float64
df_fatalities_dict=list(df_fatalities_by_race.groups.keys())
df_fatalities_dict=df_fatalities_dict[1:]
df_fatalities_dict
['A', 'B', 'H', 'N', 'O', 'W']
Create a separate KDE plot for each race. Is there a difference between the distributions?
for i in range(len(df_fatalities_dict)):
sns.displot(df_fatalities_by_race.get_group(df_fatalities_dict[i]), x="age", kind="kde")
plt.title(df_fatalities_dict[i])
plt.show()
#The age distribution for Black victims skews younger; for White victims most fatalities fall roughly between 30 and 60
Create a chart that shows the total number of people killed by race.
# A bar chart of absolute counts per race
race_counts = df_fatalities["race"].value_counts()
plt.figure(figsize=(8, 5))
plt.bar(race_counts.index.astype(str), race_counts.values, color='maroon')
plt.xlabel("Race")
plt.ylabel("Number of People Killed")
plt.title("Total Number of People Killed by Race")
plt.show()
What percentage of people killed by police showed signs of mental illness?
(df_fatalities[df_fatalities["signs_of_mental_illness"]==True].count()/df_fatalities.count())*100
#24.97%
id 24.97 name 24.97 date 24.97 manner_of_death 24.97 armed 24.97 age 24.97 gender 24.97 race 24.97 city 24.97 state 24.97 signs_of_mental_illness 24.97 threat_level 24.97 flee 24.97 body_camera 24.97 dtype: float64
Create a chart ranking the top 10 cities with the most police killings. Which cities are the most dangerous?
df_fatalities_cities=df_fatalities["city"].value_counts().head(10)
df_fatalities_cities
#Los Angeles has the most recorded killings, though these raw counts do not adjust for population size
Los Angeles 39 Phoenix 31 Houston 27 Chicago 25 Las Vegas 21 San Antonio 20 Columbus 19 Austin 18 Miami 18 St. Louis 15 Name: city, dtype: int64
Find the share of each race in the top 10 cities. Contrast this with the top 10 cities of police killings to work out the rate at which people are killed by race for each city.
df_fatalities_race_city=df_fatalities["race"].groupby(df_fatalities["city"]).value_counts().nlargest(10)
df_fatalities_race_city
city race Chicago B 21 Los Angeles H 19 Houston B 15 Austin W 13 Phoenix W 12 San Antonio H 12 Columbus B 11 Phoenix H 11 St. Louis B 11 Los Angeles B 10 Name: race, dtype: int64
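To contrast these counts with each city's demographics, divide killings per race by that race's population share from df_share_race_city. A sketch for a single hypothetical city (the figures are illustrative only; real shares would be looked up in the census frame):

```python
import pandas as pd

# Hypothetical killings by race in one city (illustrative, not real data)
killings = pd.Series({'W': 12, 'B': 21, 'H': 19})

# Hypothetical population shares (%) for the same city, as in df_share_race_city
shares = pd.Series({'W': 50.0, 'B': 30.0, 'H': 20.0})

# Killings per percentage point of population share:
# a higher value means that group is killed at a disproportionate rate
rate = killings / shares
print(rate.round(2))
```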
df_fatalities_by_state = df_fatalities.groupby("state")
# Per-state count of killings (count of the non-null "armed" column)
df_fatalities_by_state = pd.DataFrame(df_fatalities_by_state["armed"].count()).reset_index()
df_fatalities_by_state
| state | armed | |
|---|---|---|
| 0 | AK | 15 |
| 1 | AL | 50 |
| 2 | AR | 26 |
| 3 | AZ | 118 |
| 4 | CA | 424 |
| 5 | CO | 74 |
| 6 | CT | 9 |
| 7 | DC | 11 |
| 8 | DE | 8 |
| 9 | FL | 154 |
| 10 | GA | 70 |
| 11 | HI | 11 |
| 12 | IA | 12 |
| 13 | ID | 17 |
| 14 | IL | 62 |
| 15 | IN | 43 |
| 16 | KS | 24 |
| 17 | KY | 43 |
| 18 | LA | 57 |
| 19 | MA | 22 |
| 20 | MD | 38 |
| 21 | ME | 13 |
| 22 | MI | 37 |
| 23 | MN | 32 |
| 24 | MO | 64 |
| 25 | MS | 23 |
| 26 | MT | 11 |
| 27 | NC | 69 |
| 28 | ND | 4 |
| 29 | NE | 15 |
| 30 | NH | 7 |
| 31 | NJ | 35 |
| 32 | NM | 51 |
| 33 | NV | 42 |
| 34 | NY | 45 |
| 35 | OH | 79 |
| 36 | OK | 78 |
| 37 | OR | 38 |
| 38 | PA | 51 |
| 39 | RI | 2 |
| 40 | SC | 44 |
| 41 | SD | 10 |
| 42 | TN | 59 |
| 43 | TX | 225 |
| 44 | UT | 23 |
| 45 | VA | 47 |
| 46 | VT | 3 |
| 47 | WA | 62 |
| 48 | WI | 43 |
| 49 | WV | 27 |
| 50 | WY | 8 |
import plotly.express as px
fig = px.choropleth(locations=df_fatalities_by_state["state"], locationmode="USA-states", color=df_fatalities_by_state["armed"], scope="usa")
fig.show()
Analyse the Number of Police Killings over Time. Is there a trend in the data?
df_fatalities_by_year = df_fatalities.groupby(["date"])
# Number of killings per day (count of the non-null "armed" column)
df_fatalities_by_year = pd.DataFrame(df_fatalities_by_year["armed"].count()).reset_index()
df_fatalities_by_year
| date | armed | |
|---|---|---|
| 0 | 01/01/16 | 1 |
| 1 | 01/01/17 | 6 |
| 2 | 01/02/16 | 2 |
| 3 | 01/02/17 | 2 |
| 4 | 01/03/15 | 3 |
| ... | ... | ... |
| 874 | 31/08/16 | 2 |
| 875 | 31/10/15 | 2 |
| 876 | 31/10/16 | 3 |
| 877 | 31/12/15 | 1 |
| 878 | 31/12/16 | 3 |
879 rows × 2 columns
# "date" is still a string; parse it to datetime before plotting over time
df_fatalities_by_year['date'] = pd.to_datetime(df_fatalities_by_year['date'], format='%d/%m/%y')
df_fatalities_by_year = df_fatalities_by_year.sort_values('date')
plt.figure(figsize=(12, 6))
sns.lineplot(x='date', y='armed', data=df_fatalities_by_year)
plt.ylabel('Killings per Day')
plt.show()
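Daily counts are noisy, so a trend is easier to see after resampling to monthly totals. A sketch on a toy date column (one row per killing, as in df_fatalities, with dates already parsed by pd.to_datetime):

```python
import pandas as pd

# Toy stand-in: one row per killing, as in df_fatalities
df = pd.DataFrame({
    'date': pd.to_datetime(['2015-01-02', '2015-01-15', '2015-02-03',
                            '2015-02-20', '2015-02-28']),
    'id': range(5),
})

# Monthly number of killings
monthly = df.resample('M', on='date')['id'].count()
print(monthly)  # 2 in Jan 2015, 3 in Feb 2015
```

On the real data, plotting the monthly series smooths the day-to-day variation and makes any trend over 2015-2017 visible.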
Now that you have analysed the data yourself, read The Washington Post's analysis here.